Modeling Prosodic Structures in Linguistically Enriched Environments
Abstract
A significant challenge in Text-to-Speech (TtS) synthesis is the formulation of the prosodic structures (phrase breaks, pitch accents, phrase accents and boundary tones) of utterances. The prediction of these elements depends heavily on the accuracy and quality of error-prone linguistic procedures, such as part-of-speech identification and syntactic tree construction. Additional linguistic factors, such as rhetorical relations, improve the naturalness of the prosody but are hard to extract from plain text. In this work, we propose a method to generate enhanced prosodic events for TtS by utilizing accurate, error-free, high-level linguistic information. We also present an XML annotation scheme that encodes syntax, grammar, new or given information, phrase subject/object information, and rhetorical elements. These linguistically enriched documents have been used to build realistic machine learning models for the prediction of the prosodic structures in terms of segmental information and ToBI marks. The methodology has been applied by exploiting a Natural Language Generator (NLG) system. The trained models have been built using classification via regression trees, and the results strongly indicate a realistic effect on the generated prosody. The approach has been evaluated by comparing the models produced from the enriched documents to those produced from plain text of the same domain. The results show an improved accuracy of up to 23%.
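As an illustration of how an XML-enriched utterance could be turned into per-word features for a prosody classifier, here is a minimal sketch. The element and attribute names (`utterance`, `phrase`, `w`, `pos`, `info`, `role`, `rhetorical`), the sample sentence, and the `word_features` helper are all hypothetical; they are not the annotation scheme defined in the paper, only an assumed stand-in that carries the same kinds of information (part of speech, new/given status, subject/object role, rhetorical relation).

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation: tag and attribute names are illustrative only
# and do not reproduce the scheme described in the abstract.
SAMPLE = """
<utterance>
  <phrase role="subject" rhetorical="none">
    <w pos="DT" info="given">The</w>
    <w pos="NN" info="given">vase</w>
  </phrase>
  <phrase role="predicate" rhetorical="elaboration">
    <w pos="VBZ" info="new">depicts</w>
  </phrase>
  <phrase role="object" rhetorical="elaboration">
    <w pos="DT" info="new">a</w>
    <w pos="NN" info="new">warrior</w>
  </phrase>
</utterance>
"""


def word_features(xml_text):
    """Flatten the annotated utterance into one feature dict per word."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for phrase in root.iter("phrase"):
        words = phrase.findall("w")
        for index, word in enumerate(words):
            rows.append({
                "word": word.text,
                "pos": word.get("pos"),            # part of speech
                "info": word.get("info"),          # new vs. given information
                "role": phrase.get("role"),        # subject / object / predicate
                "rhetorical": phrase.get("rhetorical"),
                # A phrase-final word is a candidate site for a phrase break.
                "phrase_final": index == len(words) - 1,
            })
    return rows


features = word_features(SAMPLE)
for row in features:
    print(row)
```

Rows of this shape could then be fed to a tree-based learner, with the phrase break or ToBI label as the target. The point of the sketch is only that the features are read directly from the annotation rather than estimated by a tagger or parser.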
Similar resources
Modeling Improved Prosody Generation from High-Level Linguistically Annotated Corpora
Synthetic speech usually suffers from bad F0 contour surface. The prediction of the underlying pitch targets robustly relies on the quality of the predicted prosodic structures, i.e. the corresponding sequences of tones and breaks. In the present work, we have utilized a linguistically enriched annotated corpus to build data-driven models for predicting prosodic structures with increased accura...
Prosody Prediction from Linguistically Enriched Documents Based on a Machine Learning Approach
One of the main aspects in text-to-speech synthesis is the successful prediction of prosodic events. In this work we deal with the prediction of prosodic phrase breaks, accent tones and boundary tones from a linguistically XML-based enriched input (SOLE-ML) produced by a Natural Language Generator (NLG) system. We first extended the original specification of SOLE-ML in order for the NLG to prod...
Perceptually based automatic prosody labeling and prosodically enriched unit selection improve concatenative text-to-speech synthesis
Prosody is an important factor in the quality of text-to-speech (TTS) synthesis. Typically, acoustic parameters such as f0 and duration are the only variables related to prosody that are used to determine unit selection. Our study explored adding the explicit use of linguistically and perceptually motivated prosodic categories in unit selection-based TTS. One of our goals was to automate the pro...
The psi/phi architecture for prosodic parsing
In this paper an architecture and an implementation for a linguistically based prosodic analyser is presented. The implementation is designed to handle typical prosodic input in the form of parallel input channels, and processes each input channel independently in a data-directed, phonologically motivated configuration of partly parallel, partly cascaded feature modules and module clusters, eac...
Prosodically Enriched Text Annotation for High Quality Speech Synthesis
Linguistically enriched text generated from natural language modules contributes significantly to the quality of speech synthesis. For all cases where such modules are not available, such enriched input needs to be produced from plain text in order to maintain quality. This work reports on a framework of several combined language resources and procedures (word/sentence identification, syntactic...
Publication date: 2004